Sharing Data and Work Across Concurrent Analytical Queries

Authors

Iraklis Psaroudakis

Manos Athanassoulis

Anastasia Ailamaki

Abstract

Today’s data deluge enables organizations to collect massive amounts of data and to analyze it with an ever-increasing number of concurrent queries. Traditional data warehouses (DWs) struggle with this task because of their query-centric model: each query is optimized and executed independently, which results in high contention for resources. Modern DWs therefore depart from the query-centric model toward execution models that share common data and work. Our goal is to show when and how a DW should employ sharing. We experimentally evaluate two sharing methodologies, using their original prototype systems, that exploit work-sharing opportunities among concurrent queries at run time. The first is Simultaneous Pipelining (SP), which shares intermediate results of common sub-plans. The second is Global Query Plans (GQP), which build and evaluate a single query plan with shared operators. First, after a short review of sharing methodologies, we show that SP and GQP are orthogonal techniques: SP can be applied to the shared operators of a GQP, reducing response times by 20%-48% in workloads with numerous common sub-plans. Second, we corroborate previous results on the negative impact of SP on performance at low concurrency. We attribute this behavior to a bottleneck caused by the push-based communication model of SP. We show that pull-based communication for SP eliminates the overhead of sharing altogether at low concurrency. It also scales better on multi-core machines than push-based SP, further reducing response times by 82%-86% at high concurrency. Third, we perform an experimental analysis of SP, GQP, and their combination, and show when each one is beneficial. We identify a trade-off between low and high concurrency. In the former case, traditional query-centric operators with SP perform better. In the latter, GQP with shared operators enhanced by SP gives the best results.
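The abstract attributes SP's overhead at low concurrency to its push-based communication model. The Python sketch below is a hypothetical illustration of that contrast, not the authors' implementation. In the push variant, the single producer of a shared sub-plan copies every output page into each consumer's buffer, so its work grows with the number of sharing queries. In the pull variant, the producer appends each page once to a shared list, and every consumer reads it through its own cursor. The class names `PushSP` and `PullSP`, the integer "pages", and the `producer_ops` counter (a rough proxy for producer-side work) are all assumptions made for illustration.

```python
from typing import Iterable, List


class PushSP:
    """Hypothetical push-based SP: the producer forwards each page to every consumer."""

    def __init__(self, n_consumers: int) -> None:
        # One private buffer per consumer query that shares the sub-plan.
        self.buffers: List[List[int]] = [[] for _ in range(n_consumers)]
        self.producer_ops = 0  # page copies performed by the producer

    def produce(self, pages: Iterable[int]) -> None:
        for page in pages:
            # The producer itself does one copy per consumer, so its work
            # grows linearly with the number of sharing queries.
            for buf in self.buffers:
                buf.append(page)
                self.producer_ops += 1


class PullSP:
    """Hypothetical pull-based SP: one shared page list, one read cursor per consumer."""

    def __init__(self, n_consumers: int) -> None:
        self.shared: List[int] = []       # pages produced so far, stored once
        self.cursors = [0] * n_consumers  # next unread position per consumer
        self.producer_ops = 0             # page appends performed by the producer

    def produce(self, pages: Iterable[int]) -> None:
        for page in pages:
            # The producer appends each page exactly once, regardless of
            # how many consumers will read it.
            self.shared.append(page)
            self.producer_ops += 1

    def pull(self, consumer: int) -> List[int]:
        # Each consumer fetches the pages it has not yet seen, at its own pace.
        start = self.cursors[consumer]
        new_pages = self.shared[start:]
        self.cursors[consumer] = len(self.shared)
        return new_pages


if __name__ == "__main__":
    pages = range(100)
    push = PushSP(n_consumers=8)
    push.produce(pages)
    pull = PullSP(n_consumers=8)
    pull.produce(pages)
    print(push.producer_ops, pull.producer_ops)  # 800 100
```

With 8 consumers and 100 pages, `PushSP` performs 800 producer-side copies and `PullSP` performs 100. This is consistent with the intuition that a push-based producer becomes a serialization point as concurrency grows. The sketch deliberately omits two things a real pull-based design would need: thread synchronization, and reclamation of pages that all consumers have already read.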

Similar resources

Sharing data and work across queries in analytical workloads

Traditionally, query execution engines in relational databases have followed a query-centric model: They optimize and execute each incoming query using a separate execution plan, independent of other concurrent queries. For workloads with low contention for resources, or workloads with short-lived queries, this model makes the optimization phase faster and creates efficient execution plans. For...

Request Window: an Approach to Improve Throughput of RDBMS-based Data Integration System by Utilizing Data Sharing Across Concurrent Distributed Queries

This paper focuses on the problem of improving distributed query throughput of the RDBMS-based data integration system that has to inherit the query execution model of the underlying RDBMS: execute each query independently and utilize a global buffer pool mechanism to provide disk page sharing across concurrent query execution processes. However, this model is not suitable for processing concur...

Simultaneous Query Pipelines in QPipe

Data warehousing and scientific database applications operate on massive datasets and are characterized by complex queries accessing large portions of the database. Concurrent queries often exhibit high data and computation overlap, e.g., they access the same relations on disk, compute similar aggregates, or share intermediate results. Unfortunately, run-time sharing in modern database engines ...

MQJoin: Efficient Shared Execution of Main-Memory Joins

Database architectures typically process queries one-at-a-time, executing concurrent queries in independent execution contexts. Often, such a design leads to unpredictable performance and poor scalability. One approach to circumvent the problem is to take advantage of sharing opportunities across concurrently running queries. In this paper we propose Many-Query Join (MQJoin), a novel method for...

A Parallel Processing Strategy for Evaluating Recursive Queries

The set of resolvents generated by a recursive intension in a first-order database is treated as a set of concurrent database queries. A strategy for efficiently evaluating these concurrent queries in a multi-processor environment is presented. The strategy combines three query processing techniques, namely, query decomposition, intermediate result sharing and data-flow and pipelined query exec...

Journal: PVLDB

Volume 6

Publication date: 2013